This archive contains the benchmarks used in the conference paper "Multipurpose Cacheing to accelerate OpenMP Target Regions on FPGAs".
Abstract: While FPGAs can offer great throughput and energy efficiency, when offloading OpenMP target regions to them, the memory bandwidth often limits the ability to exploit their potential. As a remedy, our OpenMP-to-FPGA compiler fully automatically inserts optimized multipurpose cache blocks into the generated FPGA hardware. We exploit characteristics of OpenMP target regions both to avoid costly bus snooping hardware and to achieve cache consistency. On a diverse set of benchmarks with data reuse, the caches reduce the runtime by 43% on average, while only consuming slightly more FPGA resources.
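The abstract only outlines the idea. The following minimal software model may help show why a cache in front of off-chip memory pays off for a target region with data reuse, and why consistency can be handled without snooping. It is a sketch under stated assumptions, not the authors' hardware design: it assumes a direct-mapped, write-back, write-allocate cache per mapped array, hypothetical sizes (`NUM_LINES`, `LINE_WORDS`, `N`), and that the host does not touch mapped buffers while the region runs, so a single flush at region exit is enough to make results visible. All names (`cache_init`, `cache_read`, `cache_write`, `cache_flush`, `stencil_region`) are illustrative and do not come from the benchmark archive or the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, chosen only for illustration. */
#define NUM_LINES  8   /* lines per cache */
#define LINE_WORDS 4   /* 32-bit words per line */
#define N          64  /* elements per mapped array */

typedef struct {
    int valid, dirty;
    uint32_t tag;
    int32_t data[LINE_WORDS];
} CacheLine;

typedef struct {
    CacheLine lines[NUM_LINES];
    int32_t *mem;                 /* off-chip buffer of one mapped array */
    uint32_t words;               /* size of that buffer in words */
    long hits, misses, transfers; /* transfers = line fetches + write-backs */
} Cache;

static void cache_init(Cache *c, int32_t *mem, uint32_t words) {
    memset(c, 0, sizeof *c);
    c->mem = mem;
    c->words = words;
}

/* Copy a dirty line back to the location it was fetched from. */
static void write_back(Cache *c, uint32_t index) {
    CacheLine *l = &c->lines[index];
    uint32_t base = (l->tag * NUM_LINES + index) * LINE_WORDS;
    memcpy(&c->mem[base], l->data, sizeof l->data);
    c->transfers++;
}

/* Direct-mapped lookup; on a miss, write back a dirty victim and fetch the whole line. */
static CacheLine *lookup(Cache *c, uint32_t addr) {
    assert(addr < c->words);
    uint32_t block = addr / LINE_WORDS;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    CacheLine *l = &c->lines[index];
    if (l->valid && l->tag == tag) {
        c->hits++;
        return l;
    }
    c->misses++;
    if (l->valid && l->dirty)
        write_back(c, index);   /* uses the old tag, before it is replaced */
    memcpy(l->data, &c->mem[block * LINE_WORDS], sizeof l->data);
    c->transfers++;
    l->valid = 1;
    l->dirty = 0;
    l->tag = tag;
    return l;
}

static int32_t cache_read(Cache *c, uint32_t addr) {
    return lookup(c, addr)->data[addr % LINE_WORDS];
}

static void cache_write(Cache *c, uint32_t addr, int32_t value) {
    CacheLine *l = lookup(c, addr);
    l->data[addr % LINE_WORDS] = value;
    l->dirty = 1;
}

/* Region exit: write back dirty lines and invalidate all lines. No snooping is
   modeled because (by assumption) the host never accesses the buffers mid-region. */
static void cache_flush(Cache *c) {
    for (uint32_t i = 0; i < NUM_LINES; i++) {
        if (c->lines[i].valid && c->lines[i].dirty)
            write_back(c, i);
        c->lines[i].valid = 0;
        c->lines[i].dirty = 0;
    }
}

/* Stand-in for a target region body with data reuse:
   out[i] = in[i-1] + in[i] + in[i+1], one cache per array (an assumption). */
static void stencil_region(Cache *in_c, Cache *out_c) {
    for (uint32_t i = 1; i < N - 1; i++) {
        int32_t s = cache_read(in_c, i - 1) + cache_read(in_c, i) + cache_read(in_c, i + 1);
        cache_write(out_c, i, s);
    }
    cache_flush(in_c);
    cache_flush(out_c);
}
```

With two 64-element arrays, `in[i] = i`, and one `cache_init` call per array, `stencil_region` leaves `out[i] = 3i` for the interior elements. The input cache serves 186 reads with 16 line fetches, and the output cache needs 16 fetches plus 16 write-backs for 62 writes. How much runtime this saves on real hardware depends on burst lengths and memory latency, which the model ignores.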
Many algorithms and applications in scientific computing exhibit irregular access patterns as consec...
The memory system remains a major performance bottleneck in modern and future architectures. In this...
FPGAs rely on massive datapath parallelism to accelerate applications even with a low clock frequenc...
Abstract—Developing FPGA implementations with an input specification in a high-level programming lan...
This data set contains the results presented in the paper "Custom Multi-Cache Architectures for Heap...
Abstract—We describe new multi-ported cache designs suitable for use in FPGA-based processor/parall...
Caches in FPGAs can improve the performance of soft processors and other applications beset by slow ...
In this paper, we present two approaches to improve the execution of Op...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
ABSTRACT Throughput processing involves using many different contexts or threads to solve multiple p...
Field-programmable gate arrays (FPGAs) often achieve order of magnitude speedups compared to micropr...
Using FPGA-based acceleration of high-performance computing (HPC) applications to reduce energy and ...
Abstract. This paper is motivated by the desire to provide an efficient and scalable software cache...
The performance gap between CPUs and memory has diverged significantly since the 1980s maki...
This dissertation presents a hardware accelerator that is able to accelerate large (including non-pa...